PDF Page Color Counter
🛠️ Description
This Python project provides a simple tool for analyzing PDF documents and counting how many pages are black & white and how many are color. Whether you’re working on document analysis, quality control, or are just curious about the composition of your PDF files, this code gives you insight into the document’s visual characteristics.
Key Features:
- Easy Integration: With a few lines of code, you can integrate this functionality into your Python applications or workflows.
- PDF Expertise: The project uses the PyMuPDF (MuPDF) library to open and render PDF files.
- Color Page Detection: Each page is rendered to an image and classified as color or black & white by comparing the mean values of its red, green, and blue channels; the API returns the count of each.
- Use Cases: This code can be employed in scenarios such as document archiving, printing optimization, or content analysis.
⚙️ Languages or Frameworks Used
- Python: The primary programming language used for the project.
- FastAPI: A modern, fast (high-performance) web framework for building APIs with Python.
- PyMuPDF (MuPDF): A lightweight and efficient PDF processing library for Python.
- OpenCV: Used for image analysis and processing.
- Pillow (PIL): Python Imaging Library for working with images.
🌟 How to run
Install all the requirements
- Run `pip install -r requirements.txt` to install all the requirements.
Set up a virtual environment
- Run `python -m venv myenv` in your terminal.
- Change your directory with `cd myenv/Scripts` if on Windows.
- Activate the virtual environment by running `source activate`.
- Move back out of the virtual environment folder to your project directory with `cd ../..`.
- Install the packages if not present: `uvicorn`, `fastapi`, `fitz`, `frontend`, `tools`, `opencv-python`, `pillow`, `python-multipart`, `PyMuPDF`.
- Run `pip install uvicorn fastapi fitz frontend tools opencv-python pillow python-multipart PyMuPDF` in your terminal.
Now just run the project
- Run `uvicorn main:app --reload`.
- Open the localhost link in your browser and append `/docs` to the URL to see the FastAPI docs UI.
- Click on POST and then Try it out.
- Click on Choose file to select the PDF whose black & white and color pages you want to count.
- Click on Execute.
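The same upload can also be scripted instead of clicked through the docs UI. The following is a minimal sketch using only the standard library. It assumes the server is listening on `http://127.0.0.1:8000/` (uvicorn's default address) and that the form field is named `file`, matching the parameter name in `main.py`. The helper names `build_upload_request` and `count_pages` are hypothetical and not part of the project.

```python
import json
import os
import urllib.request
import uuid

# Assumption: uvicorn's default host and port; adjust if you start it differently.
DEFAULT_URL = "http://127.0.0.1:8000/"


def build_upload_request(pdf_bytes: bytes, filename: str,
                         url: str = DEFAULT_URL) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying one PDF in the form field "file"."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def count_pages(pdf_path: str, url: str = DEFAULT_URL) -> dict:
    """Upload the PDF at pdf_path and return the JSON response as a dict."""
    with open(pdf_path, "rb") as f:
        request = build_upload_request(f.read(), os.path.basename(pdf_path), url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, `count_pages("sample.pdf")` (where `sample.pdf` is a placeholder path) should return the same JSON object that the docs UI shows after clicking Execute.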
Source Code: main.py
from fastapi import FastAPI, UploadFile, File
import fitz  # provided by the PyMuPDF package
import cv2
from PIL import Image
import numpy as np
import os

app = FastAPI()


@app.post("/")
async def get_pdf(file: UploadFile = File(...)):
    # Initialize the counters and the per-category page-number lists.
    colored_page_count = 0
    black_count = 0
    color_list = []
    black_list = []

    # Build the full local path; basename() drops any directory parts
    # a client may have put in the uploaded file name.
    file_path = os.path.join(os.getcwd(), os.path.basename(file.filename))
    print(file_path)

    # Save the uploaded file locally.
    contents = await file.read()
    with open(file_path, "wb") as f:
        f.write(contents)

    # Open the PDF file.
    with fitz.open(file_path) as doc:
        print(doc)
        # Iterate through the pages, numbering them from 1.
        for num, page in enumerate(doc, start=1):
            # Render the page to an RGB image.
            pix = page.get_pixmap(alpha=False)
            img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
            arr = np.array(img)
            # Mean of each channel over the whole page.
            arr_mean = cv2.mean(arr)
            # Heuristic: a page counts as color when its channel means differ.
            if not (arr_mean[0] == arr_mean[1] == arr_mean[2]):
                colored_page_count += 1
                color_list.append(num)
            else:
                black_count += 1
                black_list.append(num)

    print("\nColored Pages: ", color_list, "\n")
    print("Black & White Pages: ", black_list)

    # The document is closed when the with block exits; remove the saved copy.
    os.remove(file_path)
    return {"colored : ": colored_page_count, "Black Count : ": black_count}
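`main.py` decides whether a page is color by comparing the mean of each channel over the whole page. The sketch below restates that test on raw bytes and sets it beside a stricter per-pixel check, offered only as an illustrative alternative and not as the project's method. It assumes the input is packed 8-bit RGB with no alpha channel and no row padding, which is what `pix.samples` is expected to hold when the page is rendered with `alpha=False` in the default RGB colorspace. The function names and the `tolerance` parameter are hypothetical.

```python
def channel_means(samples: bytes) -> tuple:
    """Mean of R, G and B over packed RGB bytes (the quantity main.py compares)."""
    n = len(samples) // 3
    return tuple(sum(samples[c::3]) / n for c in range(3))


def is_color_by_mean(samples: bytes) -> bool:
    """Mean-based test: color when the three channel means are not all equal."""
    r, g, b = channel_means(samples)
    return not (r == g == b)


def is_color_by_pixel(samples: bytes, tolerance: int = 0) -> bool:
    """Per-pixel test: color when any pixel's channels differ by more than tolerance."""
    r, g, b = samples[0::3], samples[1::3], samples[2::3]
    if tolerance == 0:
        # Fast path: a grayscale image has three identical channel planes.
        return not (r == g == b)
    return any(max(p) - min(p) > tolerance for p in zip(r, g, b))


# One red, one green and one blue pixel: every channel mean is 85.0, so the
# mean-based test reports "not color" while the per-pixel test reports "color".
balanced = bytes([255, 0, 0, 0, 255, 0, 0, 0, 255])
print(is_color_by_mean(balanced), is_color_by_pixel(balanced))  # False True
```

The two tests disagree only when the colors on a page happen to balance out in the mean. A small non-zero `tolerance` may help with scans in which gray pixels carry slight color noise, at the cost of a slower pixel-by-pixel loop.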